Understanding the workings of the human mind is one of the great
scientific frontiers of our time. One of the few paths into the brain
is the auditory system, and discovering the boundaries between
auditory cognition, perception, and the signals that arrive at our ears is a
way to probe at the edges of our understanding. Building
models that try to mimic particular human abilities is a great way to
proceed: when the models are successful, they lead to better algorithms
and to new applications. When the models fail, they point to places where
deeper understanding is needed. Studying the rhythmic aspects of
music is one piece of this larger puzzle.
Three important aspects of rhythmic phenomena are their nonverbal
nature, their relationship with motor activity, and their relationship
with time. First, rhythmic knowledge is nonverbal, yet it operates in a
hierarchical, multi-tiered fashion analogous to language, with "notes"
instead of "phonemes" and "musical phrases" instead of "sentences."
Rhythmic phenomena convey a kind of meaning that is difficult to
express in words, just as words convey a kind of meaning that is
difficult to express in rhythm.
Second, rhythmic activities are closely tied to the motor system, and there
is an interplay between kinesthetic "meaning" and "memory" and
other kinds of meaning and memory. From the work song to the
dance floor, the synchronization of activities
is a common theme in human interaction, one that can help to solidify
group relationships.
Third, rhythmic activities are one of the few ways that humans interact with time. We sense light with our eyes and sound with our ears. But what organ senses the passage of time? There is none, yet we clearly do know that it is passing. Gibson concludes that time is an intellectual achievement, not a perceptual category. By observing how time appears to pass, Kramer explores the interactions between musical and absolute time, and shows how musical compositions can interrupt or reorder time as experienced. Indeed, Chapter 10 shows very concretely how such reorderings can be exploited as compositional elements. In arguing that music and time reveal each other, Langer states elegantly that music "makes time audible."
How do we learn about time? Children playing with blocks are learning about space and spatial relationships. Talking, singing, and listening to speech and music teach about time and temporal relationships. Jody Diamond has made this point about gamelan music, and her comments apply equally well to the study of rhythm in general.
Rhythm and Transforms focuses on a few of the simplest low-level features of
musical rhythm, such as the beat, the pulse, and the short
phrase, and attempts to create
algorithms that can emulate the ability of listeners to identify
these features. We take a strictly pragmatic viewpoint, trying to
relate things we can measure to things we can perceive;
such correlations demonstrate neither cause nor effect.
The models are essentially mathematical tricks that may be applied to
sound waveforms, and the signal processing techniques emphasize
properties inherent in the signal prior to perceptual processing.
Nonetheless, as the discussion throughout this chapter suggests, the models are often inspired by the operation of the perceptual mechanisms (or, more accurately, by guesses as to how the perceptual mechanisms might operate). For example, Chapters 5-7 explore mathematical models of periodicity detection. To make these applicable to musical signals, a kind of perceptual preprocessing is applied which extracts certain elementary features from the waveform. These derived quantities (like the feature vector of Fig. 9) feed the periodicity detection. Similarly, Chapter 7
describes a decidedly un-biological Bayesian model of beat extraction
from musical signals. These models function in concert
with perceptually inspired features that are extracted
from the musical signal.
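To make the periodicity-detection pipeline concrete, here is a minimal sketch, not taken from the book: a crude onset-strength envelope plays the role of the extracted feature vector, and its autocorrelation yields a tempo estimate. The function names (onset_strength, estimate_tempo) and all parameter values are illustrative assumptions.

```python
import numpy as np

def onset_strength(x, frame=1024, hop=512):
    """Crude stand-in for perceptual preprocessing: positive changes
    in short-time energy, loosely analogous to the feature vector
    that feeds the periodicity detectors."""
    energy = np.array([np.sum(x[i:i + frame] ** 2)
                       for i in range(0, len(x) - frame, hop)])
    return np.maximum(np.diff(energy), 0.0)  # keep only energy increases

def estimate_tempo(env, env_rate, min_bpm=40.0, max_bpm=240.0):
    """Pick the dominant periodicity of the envelope by autocorrelation.

    env_rate is the envelope sample rate (audio rate divided by hop);
    the signal should be at least a few seconds long."""
    env = env - env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    lo = int(round(env_rate * 60.0 / max_bpm))  # shortest allowed lag
    hi = int(round(env_rate * 60.0 / min_bpm))  # longest allowed lag
    lag = lo + int(np.argmax(ac[lo:hi]))
    return 60.0 * env_rate / lag                # beats per minute

# Hypothetical usage, for a mono signal x sampled at sr Hz:
# env = onset_strength(x)
# bpm = estimate_tempo(env, env_rate=sr / 512)
```

Autocorrelation is only one of many possible periodicity detectors; the Bayesian beat model mentioned above approaches the same problem statistically.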
Several new and exciting applications open up once the foot-tapping machine of Fig. 2
can reliably locate the beats and basic periodicities
of a musical performance:
- Musical Editing:
- Identification of beat boundaries allows easy cut-and-paste operations when editing musical signals (a small sketch of beat-aligned editing appears after this list).
- An Intelligent Drum Machine:
- Typical drum machines are
preprogrammed to play rhythms at predefined speeds and
the performers must synchronize themselves
to the machine.
A better idea is to build a drum machine that can
"listen" to the music and follow the beat laid down by the musicians.
- External Synchronization:
- Beat identification
enables automated synchronization of the music with light
effects, video clips, or any kind of computer controlled system.
This may be especially useful in the synchronization of audio to
video in film scoring.
- A Tool for Disc Jockeys:
- Any identified levels of
metrical information (as fast as the tatum or as slow as the
phrase) can be used to mark
the boundaries of a rhythmic loop or to synchronize two or more
audio tracks.
- Music Transcription:
- Meter estimation is required
for time quantization, an indispensable subtask of transcribing a
musical performance into a musical score.
- Beat-Based Signal Processing:
- Beats provide natural boundaries in a musical signal which can be used to align a variety of signal processing techniques with the music. For example, filters, delays, echoes, and vibratos (as well as other operations) may exploit beat boundaries in their processing. This is discussed in Chapter 9, where appropriate algorithms are derived.
- Beat-Based Musical Recomposition:
- Automatic identification of beat boundaries allows composers to work easily at the level of the beat, an underexplored compositional scale; see the sketch following this list. Several surprising techniques are discussed and explored in Chapter 10.
- Information Retrieval:
- The standard way to search for
music (on the web, for instance) is to search metadata
such as file names, .mp3 ID tags, and keywords.
It would be better to be able to search using melodic or rhythmic
features, and techniques such as beat tracking may help to make
this possible.
- Score Following:
- In order for a computer program to
follow a live performer and act as a responsive accompanist, it
needs to sense and anticipate the location of
musically significant points such as beat boundaries and measures.
- Personal Conducting:
- By combining beat tracking with an input device (such as a wand that senses position and/or acceleration) and a method of slowing or speeding the sound (such as a phase vocoder, see Chapter 5), the listener can "conduct" the music at a desired tempo and with the desired expressive timing.
- Speech Processing:
- Rhythm plays an important role in
speech comprehension because it can help to segment connected speech
into individual phrases and syllables.
- Visualization Software:
- Designed to augment the musical
experience by presenting appropriate visuals on a screen,
visualization software is a popular adjunct to computer-based
music players. Many of these relate the visuals to the music
using the amplitude of the audio signal (so that, for instance,
louder passages move faster), the shape of the waveform, or
various transforms. It would clearly be preferable for the visuals
to also synchronize with the beat of the piece.
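To ground the editing and recomposition items above, here is a minimal sketch, not taken from the book, of beat-aligned cut-and-paste. It assumes the beat locations have already been found (for instance by the foot-tapping machine of Fig. 2); slice_at_beats, recompose, and the loader and tracker named in the usage comments are hypothetical.

```python
import numpy as np

def slice_at_beats(x, sr, beat_times):
    """Cut a mono signal into beat-length segments.

    beat_times: beat locations in seconds, as produced by any beat
    tracker; segments run from each beat to the next.
    """
    idx = [int(round(t * sr)) for t in beat_times]
    return [x[a:b] for a, b in zip(idx[:-1], idx[1:])]

def recompose(segments, order):
    """Reassemble beat segments in a new order (beat-level editing)."""
    return np.concatenate([segments[i] for i in order])

# Hypothetical usage: swap the beats within each pair.
# x, sr = load_audio("song.wav")     # any audio loader (assumed)
# beats = track_beats(x, sr)         # any beat tracker (assumed)
# segs = slice_at_beats(x, sr, beats)
# y = recompose(segs, [1, 0, 3, 2])  # first four beats, reordered
```

Because every cut falls on a beat boundary, the spliced result tends to preserve the pulse even when segments are reordered, which is what makes beat-level cut-and-paste musically forgiving.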